EMPHASIS OF SHORT-DURATION TRANSIENT SPEECH FEATURES

Field of the Invention 
This invention relates to the processing of signals derived from sound stimuli, particularly for the generation of stimuli in auditory prostheses, such as cochlear implants and hearing aids, and in other systems requiring sound processing or encoding.

Background of the Invention 
Various speech processing strategies have been developed for processing sound signals for use in stimulating auditory prostheses, such as cochlear prostheses and hearing aids. Some strategies focus on particular aspects of speech, such as formants. Other strategies rely on more general channelization and amplitude-related selection, such as the Spectral Maxima Sound Processor (SMSP) strategy, which is described in greater detail in Australian Patent No. 657959 by the present applicant, the contents of which are incorporated herein by cross reference.

A recurring difficulty with all such sound processing systems is the provision of adequate information to the user to enable optimal perception of speech in the sound stimulus.

Summary of the Invention and Object 

It is an object of the present invention to provide a sound processing strategy to assist in perception of low-intensity short-duration speech features in the sound stimuli.

The invention provides a sound processing device having means for estimating the amplitude envelope of a sound signal in a plurality of spaced frequency channels, means for analyzing the estimated amplitude envelopes over time so as to detect short-duration amplitude transitions in said envelopes, means for increasing the relative amplitude of said short-duration amplitude transitions, including means for determining a rate of change profile over a predetermined time period of said short-duration amplitude transitions, and means for determining from said rate of change profile the size of an increase in relative amplitude applied to said transitions in said sound signal to assist in perception of low-intensity short-duration speech features in said signal.

In a preferred form, the predetermined time period is about 60 ms. The faster or greater the rate of change, on a logarithmic amplitude scale, of said short-duration amplitude transitions, the greater the increase in relative amplitude which is applied to said transitions. Furthermore, rate of change profiles corresponding to short-duration burst transitions receive a greater increase in relative amplitude than do profiles corresponding to onset transitions. In the present specification, a "burst transition" is understood to be a rapid increase followed by a rapid decrease in the amplitude envelope, while an "onset transition" is understood to be a rapid increase followed by a relatively constant level in the amplitude envelope.

The above-defined Transient Emphasis strategy has been designed in particular to assist perception of low-intensity short-duration speech features for the severe-to-profound hearing impaired or cochlear implantees. These speech features typically consist of: i) low-intensity short-duration noise bursts or frication energy that accompany plosive consonants; and ii) rapid transitions in frequency of speech formants (in particular the 2nd formant, F2) such as those that accompany articulation of plosive, nasal and other consonants. Improved perception of these features has been found to aid perception of some consonants (namely plosives and nasals) as well as overall speech perception when presented in competing background noise.

The Transient Emphasis strategy is preferably applied as a front-end process to other speech processing systems, particularly but not exclusively for stimulating implanted electrode arrays. The currently preferred embodiment of the invention is incorporated into the Spectral Maxima Sound Processor (SMSP) strategy, as referred to above. The combined strategy, known as the Transient Emphasis Spectral Maxima (TESM) Sound Processor, utilises the transient emphasis strategy to emphasise the SMSP's filter bank outputs prior to selection of the channels with the largest amplitudes.
As with most multi-channel speech processing systems, the input sound signal is divided up into a multitude of frequency channels by using a bank of band-pass filters. The signal envelope is then derived by rectifying and low-pass filtering the signal in these bands. Emphasis of short-duration transitions in the envelope signal for each channel is then carried out. This is done by: i) detection of short-duration (approximately 5 to 60 milliseconds) amplitude variations in the channel envelope, typically corresponding to speech features such as noise bursts, formant transitions, and voice onset; and ii) increasing the signal gain during these periods. The gain applied is related to a function of the 2nd-order derivative with respect to time of the slow-varying envelope signal (or some similar rule, as described below in the Description of Preferred Embodiment).

During periods of steady state or relatively slow varying levels in the 
envelope signal (over a period of approximately 60ms) no gain is applied. 

During periods where short-duration transitions in the envelope signal are detected, the amount of gain applied can typically vary up to about 14 dB. The gain varies depending on the nature of the short-duration transition, which can be classified as either of the following: i) a rapid increase followed by a decrease in the signal envelope (over a period of no longer than approximately 60 ms), which typically corresponds to speech features such as the noise burst in a plosive consonant or the rapid frequency shift of a formant in a consonant-to-vowel or vowel-to-consonant transition; ii) a rapid increase followed by a relatively constant level in the signal envelope, which typically corresponds to speech features such as the onset of voicing in a vowel. Short-duration speech features classified according to i) are considered to be more important to perception than those classified according to ii) and thus receive roughly twice as much gain. Note that a relatively constant level followed by a rapid decrease in the signal envelope, which corresponds to the abrupt cessation of voicing or sound, receives little to no gain.
Brief Description of the Drawings

In order that the invention may be more readily understood, one presently preferred embodiment of the invention will now be described with reference to the accompanying drawings, in which:

Figure 1 is a schematic representation of the signal processing applied to the sound signal in accordance with the present invention, and

Figures 2 and 3 are comparative electrodograms of sound signals to show the effect of the invention.

Description of Preferred Embodiment

Referring to Figure 1, the presently preferred embodiment of the invention is described with reference to its use with the SMSP strategy. As with the SMSP strategy, electrical signals corresponding to sound signals received via a microphone 1 and pre-amplifier 2 are processed by a bank of N parallel filters 3 tuned to adjacent frequencies (typically N = 16). Each filter channel includes a band-pass filter 4, then a rectifier 5 and low-pass filter 6 to provide an estimate of the signal amplitude (envelope) in each channel. In this embodiment a Fast Fourier Transform (FFT) implementation of the filter bank is employed. The outputs of the N-channel filter bank are modified by the transient emphasis algorithm 7 (as described below) prior to further processing in accordance with the SMSP strategy.
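As a minimal sketch of the rectifier and low-pass stage of a single filter channel, the following Python fragment full-wave rectifies a signal that is assumed to have already been band-pass filtered and smooths it with a one-pole low-pass filter. The function name `envelope`, the one-pole filter, the 200 Hz cut-off and the 16 kHz sample rate are assumptions made purely for illustration; none of them is taken from the specification, and the embodiment itself uses an FFT implementation of the filter bank.

```python
import math


def envelope(band_signal, sample_rate_hz, cutoff_hz=200.0):
    """Estimate the amplitude envelope of one band-limited channel.

    Full-wave rectification followed by a one-pole low-pass filter.
    The filter type and the cut-off value are illustrative assumptions.
    """
    # Smoothing coefficient of the one-pole low-pass filter.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    state = 0.0
    out = []
    for x in band_signal:
        state += alpha * (abs(x) - state)  # rectify, then smooth
        out.append(state)
    return out


# Illustrative input: 100 ms of a unit-amplitude 1 kHz tone.
fs = 16000
tone = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(fs // 10)]
env = envelope(tone, fs)
```

For a steady tone the output settles near the mean of the rectified sine (about 2/π of the peak amplitude), with some residual ripple that depends on the chosen cut-off.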

A running history, which spans a period of 60 ms at 2.5 ms intervals, of the envelope signals in each channel is maintained in a sliding buffer 8 denoted S_n(t), where the subscript n refers to the channel number and t refers to time relative to the current analysis interval. This buffer is divided into three consecutive 20 ms time windows, and an estimate of the slow-varying envelope signal in each window is obtained by averaging across the terms in the window. The averaging window provides approximate equivalence to a 2nd-order low-pass filter with a cut-off frequency of 45 Hz and is primarily used to smooth fine envelope structure, such as voicing frequency modulation and unvoiced noise modulation. Averages from the three windows are therefore estimates of the past (E_p) 9, current (E_c) 10 and future (E_f) 11 slow-varying envelope signal with reference to the mid-point of the buffer S_n(t). The amount of additional gain applied is derived from a function of the slow-varying envelope estimates as per Eq. (1). A derivation and analysis of this function can be found in Appendix A.

$$G = \frac{2E_c - 2E_p - E_f}{E_c + E_p + E_f} \tag{1}$$
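The three window averages that feed Eq. (1) can be sketched as follows. A 60 ms history sampled every 2.5 ms holds 24 values, so each 20 ms window holds 8. The function name `window_averages`, the oldest-first ordering of the buffer and the example values are assumptions made for illustration only.

```python
def window_averages(buffer, window_len=8):
    """Return (E_p, E_c, E_f) for one channel's envelope history.

    `buffer` is assumed to hold 3 * window_len samples, oldest first:
    the past, current and future 20 ms windows relative to the
    mid-point of the buffer.
    """
    if len(buffer) != 3 * window_len:
        raise ValueError("buffer must hold exactly three windows")
    means = [
        sum(buffer[start:start + window_len]) / window_len
        for start in range(0, 3 * window_len, window_len)
    ]
    e_p, e_c, e_f = means
    return e_p, e_c, e_f


# Illustrative short burst: quiet, loud for 20 ms, quiet again.
burst = [1.0] * 8 + [10.0] * 8 + [1.0] * 8
e_p, e_c, e_f = window_averages(burst)
```

In a running system the buffer would slide forward by one 2.5 ms sample per analysis interval, and the averages would be recomputed each time.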

The gain factor (G) 12 for each channel varies with the behaviour of the slow-varying envelope signals such that: (a) short-duration signals which consist of a rapid rise followed by a rapid fall (over a time period of no longer than approximately 60 ms) in the slow-varying envelope signal produce the greatest values of G. For these types of signals, G could be expected to range from approximately 0 to 2. (b) The onset of long-duration signals which consist of a rapid rise followed by a relatively constant level in the envelope signal produces lower levels of G, which typically range from 0 to 0.5. (c) A relatively steady-state or slow-varying envelope signal produces negative values of G. (d) A relatively steady-state level followed by a rapid decrease in the envelope signal (i.e. cessation/offset of envelope energy) produces small (less than approximately 0.1) or negative values of G. Because negative values of G could arise, the result of Eq. (1) is limited at 13 such that it can never fall below zero, as per Eq. (2).

$$\text{If } (G < 0) \text{ then } G = 0 \tag{2}$$
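Eqs. (1) and (2) together can be sketched in a few lines of Python. The function name `gain_factor`, the handling of an all-zero buffer and the example envelope values are assumptions for illustration; the specification does not state what happens when all three estimates are zero.

```python
def gain_factor(e_p, e_c, e_f):
    """Gain factor G of Eq. (1), limited to be non-negative as in Eq. (2)."""
    total = e_c + e_p + e_f
    if total <= 0.0:
        # Assumption: a silent buffer receives no gain (not specified in the text).
        return 0.0
    g = (2.0 * e_c - 2.0 * e_p - e_f) / total
    return max(g, 0.0)


# Illustrative envelope shapes (arbitrary linear units):
burst = gain_factor(1.0, 10.0, 1.0)     # rise then fall: about 1.42
onset = gain_factor(1.0, 10.0, 10.0)    # rise then constant: about 0.38
steady = gain_factor(10.0, 10.0, 10.0)  # steady state: negative, limited to 0
offset = gain_factor(10.0, 10.0, 1.0)   # constant then fall: negative, limited to 0
```

The four example shapes correspond to cases (a) to (d) above: the burst receives the largest gain factor, the onset a smaller one, and the steady-state and offset cases are limited to zero.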

Another important property of Eq. (1) is that the gain factor is related to a function of relative differences, rather than absolute levels, in the magnitude of the slow-varying envelope signal. For instance, short-duration peaks in the slow-varying envelope signal of different peak levels but identical peak-to-valley ratios would be amplified by the same amount.
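This independence from absolute level follows directly from the form of Eq. (1). As a brief worked check, scaling all three envelope estimates by a common factor α > 0 (a symbol introduced here for illustration only) leaves G unchanged, because α cancels between numerator and denominator:

```latex
\[
G(\alpha E_p,\ \alpha E_c,\ \alpha E_f)
  = \frac{\alpha\,(2E_c - 2E_p - E_f)}{\alpha\,(E_c + E_p + E_f)}
  = G(E_p,\ E_c,\ E_f)
\]
```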

The gain factors for each channel (G_n), where n denotes the channel number, are used to scale the original envelope signals S_n(t) according to Eq. (3), where t_m refers to the midpoint of the buffer S_n(t).

$$S'_n(t_m) = S_n(t_m) \times (1 + K_n \times G_n) \tag{3}$$

A gain modifier constant (K_n) is included at 14 for adjustment of the overall gain of the algorithm. In this embodiment, K_n = 2 for all n. During periods of little change in the envelope signal of any channel, the gain factor (G_n) is equal to zero and thus S'_n(t_m) = S_n(t_m), whereas during periods of rapid change, G_n could range from 0 to 2 and thus a total of 0 to 14 dB of gain could be applied. Note that because the gain is applied at the midpoint of the envelope signals, an overall delay of approximately 30 ms between input and output of the transient emphasis algorithm is introduced. The modified envelope signals S'_n(t) at 15 replace the original envelope signals S_n(t) derived from the filter bank, and processing then continues as per the SMSP strategy. As with the SMSP strategy, the M of the N channels of S'_n(t) having the largest amplitude at a given instant in time are selected at 16 (typically M = 6). This selection occurs at regular time intervals, typically every 2.5 ms for the transient emphasis strategy. The M selected channels are then used to generate M electrical stimuli 17 of stimulus intensity and electrode number corresponding to the amplitude and frequency of the M selected channels (as per the SMSP strategy). These M stimuli are transmitted to the cochlear implant 19 via a radio-frequency link 18 and are used to activate M corresponding electrode sites.
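The scaling of Eq. (3) and the subsequent selection of spectral maxima can be sketched as below. The function names, the tie-breaking behaviour of the selection, and the 8-channel example with M = 3 are assumptions chosen to keep the illustration small; the embodiment typically uses N = 16 and M = 6.

```python
import math


def emphasise(s_mid, g, k=2.0):
    """Eq. (3): scale the mid-buffer envelope sample by (1 + K * G)."""
    return s_mid * (1.0 + k * g)


def gain_db(g, k=2.0):
    """Gain of Eq. (3) expressed in dB, treating the envelope as an amplitude."""
    return 20.0 * math.log10(1.0 + k * g)


def select_maxima(envelopes, m=6):
    """Return the indices of the m largest channels, in channel order.

    Ties are resolved in favour of the lower channel number, which is an
    assumption; the text does not specify tie-breaking.
    """
    ranked = sorted(range(len(envelopes)), key=lambda n: envelopes[n], reverse=True)
    return sorted(ranked[:m])


# Illustrative 8-channel frame: channel 1 holds a weak burst (G = 2).
envelopes = [5.0, 1.2, 4.0, 0.5, 3.0, 0.2, 0.1, 0.1]
gains = [0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

before = select_maxima(envelopes, m=3)
emphasised = [emphasise(s, g) for s, g in zip(envelopes, gains)]
after = select_maxima(emphasised, m=3)
```

In this example channel 1 is not among the three maxima before emphasis but is selected afterwards. With K = 2, `gain_db(2.0)` evaluates to about 14 dB and `gain_db(0.5)` to about 6 dB, in line with the figures quoted in the text.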

Because the transient emphasis algorithm is applied prior to selection of spectral maxima, channels containing low-intensity short-duration signals which (a) normally fall below the mapped threshold level of the speech processing system, or (b) are not selected by the SMSP strategy due to the presence of channels containing higher-amplitude steady-state signals, are given a greater chance of selection due to their amplification.

To illustrate the effect of the strategy on the coding of speech signals, stimulus output patterns known as electrodograms (which are similar to spectrograms for acoustic signals), which plot stimulus intensity per channel as a function of time, were recorded for the SMSP and TESM strategies and are shown in Figs. 2 and 3 respectively. The speech token presented in these recordings was /god/, spoken by a female speaker. The effect of the TESM strategy can be seen in the stimulus intensity and number of electrodes representing the noise burst energy in the initial stop /g/ (point A). The onset of the formant energy in the vowel /o/ has also been emphasised slightly (point B). Most importantly, stimuli representing the second formant transition from the vowel /o/ to the final stop /d/ are also higher in intensity (point C), as are those coding the noise burst energy in the final stop /d/ (point D).

APPENDIX A: TESM Gain Factor

To derive a function for the gain factor (G) 12 for each channel in terms of the slow-varying envelope signal, the following criteria were used. Firstly, the gain factor should be related to a function of the 2nd-order derivative of the slow-varying envelope signal. The 2nd-order derivative is maximally negative for peaks (and maximally positive for valleys) in the slow-varying envelope signal, and thus it should be negated; see Eq. (A1).

$$G \propto 2E_c - E_p - E_f \tag{A1}$$

Secondly, for the case when the 'backward' gradient (i.e. E_c - E_p) is positive but small, significant gain as per Eq. (A1) can result when E_f is small (i.e. at the cessation (offset) of envelope energy for a long-duration signal). This effect is not desirable and can be minimised by reducing the backward gradient to near zero or less (i.e. negative) in cases when it is small. However, when the backward gradient is large, Eq. (A1) should hold. A simple solution is to scale E_p by 2. A function for the 'modified' 2nd-order derivative is given in Eq. (A2). As E_p approaches E_c, G approaches -E_f rather than E_c - E_f as in Eq. (A1), and thus the gain factor approaches a small or negative value. However, for E_p << E_c, G approaches 2 x E_c - E_f, which is identical to the limiting condition for Eq. (A1).

$$G \propto 2E_c - 2E_p - E_f \tag{A2}$$
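The difference between the plain and the modified 2nd-order derivative is easiest to see numerically. The sketch below evaluates the right-hand sides of Eqs. (A1) and (A2) (unnormalised, so only the relative sizes matter) for two envelope shapes. The function names and the example values are assumptions chosen for illustration.

```python
def plain_second_difference(e_p, e_c, e_f):
    """Right-hand side of Eq. (A1): negated 2nd-order difference."""
    return 2.0 * e_c - e_p - e_f


def modified_second_difference(e_p, e_c, e_f):
    """Right-hand side of Eq. (A2): as (A1), but with E_p scaled by 2."""
    return 2.0 * e_c - 2.0 * e_p - e_f


# Offset of a long-duration signal: small backward gradient, E_f small.
offset_a1 = plain_second_difference(9.0, 10.0, 1.0)     # 10.0 (large, undesirable)
offset_a2 = modified_second_difference(9.0, 10.0, 1.0)  # 1.0 (largely suppressed)

# Rapid rise: E_p much smaller than E_c, so the two forms nearly agree.
rise_a1 = plain_second_difference(0.1, 10.0, 1.0)     # about 18.9
rise_a2 = modified_second_difference(0.1, 10.0, 1.0)  # about 18.8
```

Scaling E_p by 2 therefore removes most of the unwanted response at signal offsets while leaving the response to rapid rises almost unchanged.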

Thirdly, because we are interested in providing gain based on relative rather than absolute differences in the slow-varying envelope signal, the gain factor should be normalised with respect to the average level of the slow-varying envelope signal as per Eq. (A3). The normalisation in Eq. (A3) compresses the linear gain factor defined in Eq. (A2) into a range of 0 to 2. The gain factor is now proportional to the modified 2nd-order derivative and inversely proportional to the average level of the slow-varying envelope channel signal.

$$G = \frac{2E_c - 2E_p - E_f}{E_c + E_p + E_f} \tag{A3}$$

Finally, the gain factor according to Eq. (A3) can fall below zero when E_c < E_p + E_f/2. Thus, Eq. (A4) is imposed on G_n so that the gain is always greater than or equal to zero.

$$\text{If } (G < 0) \text{ then } G = 0 \tag{A4}$$

An analysis of the limiting cases for the gain factor can be used to describe its behaviour as a function of the slow-varying envelope signal. For the limiting case when E_p is much smaller than E_c (i.e. during a period of rapid rise in the envelope signal), Eq. (A3) reduces to:

$$G = \frac{2E_c - E_f}{E_c + E_f} \tag{A5}$$

In this case, if E_f is greater than E_c and approaches 2 x E_c (i.e. during a period of steady rise in the slow-varying envelope signal), G approaches zero. If E_f is similar to E_c (i.e. at the end of a period of rise for a long-duration signal), G is approximately 0.5. If E_f is a lot smaller than E_c (i.e. at the apex of a rapid rise which is immediately followed by a rapid fall, as is the case for a short-duration peak in the envelope signal), G approaches 2, which is the maximum value possible for G.

For the limiting case when E_f is much smaller than E_c, Eq. (A3) reduces to:

$$G = \frac{2E_c - 2E_p}{E_c + E_p} \tag{A6}$$

In this case, if E_c is similar to E_p (i.e. cessation/offset of envelope for a long-duration signal), G approaches zero. If E_c is much greater than E_p (i.e. at a peak in the envelope), G approaches the maximum gain of 2.
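The limiting values quoted for Eqs. (A5) and (A6) can be checked with a short worked substitution. The symbols ρ = E_f/E_c and σ = E_p/E_c are introduced here purely for illustration and do not appear in the specification.

```latex
\[
\text{(A5)}:\quad G = \frac{2 - \rho}{1 + \rho}, \qquad
\rho = 2 \Rightarrow G = 0, \quad
\rho = 1 \Rightarrow G = \tfrac{1}{2}, \quad
\rho \to 0 \Rightarrow G \to 2
\]
\[
\text{(A6)}:\quad G = \frac{2\,(1 - \sigma)}{1 + \sigma}, \qquad
\sigma = 1 \Rightarrow G = 0, \quad
\sigma \to 0 \Rightarrow G \to 2
\]
```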

When dealing with speech signals, intensity is typically defined on a log (dB) scale. It is thus convenient to view the applied gain factor in relation to the gradient of the log-magnitude of the slow-varying envelope signal. Eq. (A3) can be expressed in terms of ratios of the slow-varying envelope signal estimates. Defining the backward magnitude ratio as R_b = E_c/E_p and the forward magnitude ratio as R_f = E_f/E_c gives Eq. (A7).
$$G = \frac{2R_b - 2 - R_b R_f}{R_b + 1 + R_b R_f} \tag{A7}$$

The forward and backward magnitude ratios are equivalent to log-magnitude gradients and can be defined as the difference between log-magnitude terms, i.e. F_g = log(E_f) - log(E_c) and B_g = log(E_c) - log(E_p) respectively. The relationship between the gain factor and the forward and backward log-magnitude gradients is shown in Fig. A1. Linear gain is plotted on the ordinate and backward log-magnitude gradient (in dB) is plotted on the abscissa. The gain factor is plotted for different levels of the forward log-magnitude gradient in each of the curves. For any value of the forward log-magnitude gradient, the gain factor reaches some maximum when the backward log-magnitude gradient is approximately 40 dB. The maximum level is dependent on the level of the forward log-magnitude gradient. For the case where the forward log-magnitude gradient is 0 dB, as shown by the dotted line (i.e. at the end of a period of rise for a long-duration signal where E_f = E_c), the maximum gain possible is 0.5. For the limiting case where the forward log-magnitude gradient is infinitely steep, as shown by the dashed line (i.e. a rapid fall in the envelope signal where E_f << E_c), the maximum gain possible is 2.0. The limiting case for the forward log-magnitude gradient is reached when its gradient is approximately -40 dB.
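The ratio form of Eq. (A7) can be evaluated directly from the two log-magnitude gradients. The sketch below assumes that the gradients are expressed in dB as 20·log10 of an amplitude ratio, which is an assumption about the convention rather than something stated in the text; the function name is likewise hypothetical.

```python
def gain_from_gradients_db(backward_db, forward_db):
    """Eq. (A7) evaluated from log-magnitude gradients given in dB.

    Assumes dB = 20 * log10(amplitude ratio). The result is limited to be
    non-negative, as in Eq. (A4).
    """
    r_b = 10.0 ** (backward_db / 20.0)  # R_b = E_c / E_p
    r_f = 10.0 ** (forward_db / 20.0)   # R_f = E_f / E_c
    g = (2.0 * r_b - 2.0 - r_b * r_f) / (r_b + 1.0 + r_b * r_f)
    return max(g, 0.0)


# Backward gradient of 40 dB with a flat forward gradient (onset case).
onset_like = gain_from_gradients_db(40.0, 0.0)

# Backward gradient of 40 dB followed by a 40 dB fall (burst case).
burst_like = gain_from_gradients_db(40.0, -40.0)

# No change in either direction (steady state).
steady_like = gain_from_gradients_db(0.0, 0.0)
```

Under these assumptions the onset case evaluates to about 0.49 and the burst case to about 1.93, close to the respective maxima of 0.5 and 2.0 described above, while the steady-state case is limited to zero.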
CLAIMS:

1. A sound processing device having means for estimating the amplitude envelope of a sound signal in a plurality of spaced frequency channels, means for analyzing the estimated amplitude envelopes over time so as to detect short-duration amplitude transitions in said envelopes, means for increasing the relative amplitude of said short-duration amplitude transitions, including means for determining a rate of change profile over a predetermined time period of said short-duration amplitude transitions, and means for determining from said rate of change profile the size of an increase in relative amplitude applied to said transitions in said sound signal to assist in perception of low-intensity short-duration speech features in said signal.

2. The device of claim 1, wherein the predetermined time period is approximately 60 ms, and the faster/greater the rate of change, on a logarithmic amplitude scale, of said short-duration amplitude transitions, the greater the increase in relative amplitude which is applied to said transitions.

3. The device of any preceding claim, wherein the detection of short-duration transitions in the rate of change profile causes up to about 14 dB of gain to be applied to the sound signal, the amount of gain being selected according to the nature of the short-duration transition:

(i) a rapid increase followed by a decrease in the profile (a short-duration burst transition) causes a gain increase of up to about 14 dB;

(ii) a rapid increase followed by a relatively constant level in the profile (an onset transition) causes a gain increase of up to about 6 dB; and

(iii) a relatively constant level followed by a rapid decrease in the profile causes little or no increase in gain.
4. The device of any preceding claim, including a microphone, a pre-amplifier, a bank of N parallel filters, and means for applying a transient emphasis algorithm to the output of the filters.

5. The device of any preceding claim, wherein the signal in any filter channel denoted S_n(t), where n denotes the filter channel number and t denotes time, is scaled according to: S'_n(t) = S_n(t) x (1 + K_n x G_n), where G_n is the gain factor for each channel and K_n is a gain modifier constant equal to about 2.

6. The device of claim 5, wherein a buffer maintains a history of the envelope signal S_n(t) in each filter channel for which an estimate of the slow-varying envelope signal is derived by an averaging window which provides an approximate equivalence to a 2nd-order low-pass filter with a cut-off frequency of about 45 Hz, causing smoothing of the fine profile structure.

7. The device of claims 5 or 6, wherein the gain factor G_n is related to a function of the 2nd-order derivative of the slow-varying envelope signal in each filter channel.

8. The device according to claims 5, 6 or 7, wherein the gain factor for each filter channel is derived from the function:

$$G_n = \frac{2E_c - 2E_p - E_f}{E_c + E_p + E_f}$$

where E_c, E_p and E_f are estimates of the current, past and future slow-varying envelope signal in each filter channel.

9. The device of claim 8, wherein the additional gain factor G_n applied to the sound signal can range from about 0 to 2 for a slow-varying envelope profile having a rapid rise followed by a rapid fall, about 0 to 0.5 for a profile having a rapid rise followed by a relatively constant level, less than 0.1 for a steady-state level followed by a rapid decrease in the profile, and about 0 for a steady-state level or slowly varying profile.
10. The device according to claims 8 and 9, wherein slow-varying envelope profiles exhibiting short-duration peaks of different peak levels but similar peak-to-valley ratios are amplified by similar amounts.